Abstract

Normal approximations for descents and inversions of permutations of the set $\{1, 2, \ldots, n\}$
are well known. We consider the number of inversions of a permutation $\pi(1), \pi(2), \ldots, \pi(n)$ of a
multiset with $n$ elements, which is the number of pairs $(i, j)$ with $1 \le i < j \le n$ and $\pi(i) > \pi(j)$.
The number of descents is the number of $i$ in the range $1 \le i < n$ such that $\pi(i) > \pi(i+1)$.
We prove that, appropriately normalized, the distribution of both inversions and descents of a
random permutation of the multiset approaches the normal distribution as $n \to \infty$, provided
that the permutation is equally likely to be any possible permutation of the multiset and no
element occurs more than $\alpha n$ times in the multiset for a fixed $\alpha$ with $0 < \alpha < 1$. Both normal
approximation theorems are proved using the size bias version of Stein's method of auxiliary
randomization and are accompanied by error bounds.

1 Introduction 

Let $\pi(1), \pi(2), \ldots, \pi(n)$ be a permutation of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$ with $n_1 + \cdots + n_h = n$.
The number of inversions, denoted $\mathrm{inv}(\pi)$, is defined as the number of pairs $(i, j)$ with $1 \le i < j \le n$
and $\pi(i) > \pi(j)$. The number of descents, denoted $\mathrm{des}(\pi)$, is the number of positions $i$ with
$1 \le i < n$ and $\pi(i) > \pi(i+1)$. Assume that $\pi$ is uniformly distributed. In this article, we use
Stein's method to prove normal approximations with error bounds for $\mathrm{inv}(\pi)$ and $\mathrm{des}(\pi)$.
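
The two statistics are easy to compute directly from their definitions. The following is a minimal Python sketch, not part of the original article; the function names `inversions` and `descents` are illustrative choices, and equal entries are counted as neither inversions nor descents, as in the definitions above.

```python
def inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j]; ties are not inversions."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def descents(seq):
    """Number of positions i with seq[i] > seq[i + 1]; ties are not descents."""
    return sum(1 for x, y in zip(seq, seq[1:]) if x > y)


# A permutation of the multiset {1^2, 2^2, 3^1}.
word = [2, 1, 3, 1, 2]
print(inversions(word), descents(word))
```

The quadratic-time loop is meant only to mirror the definition; it is not an efficient way to count inversions of long sequences.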

In the special case where $\pi$ is a uniformly distributed permutation of the set $\{1, 2, \ldots, n\}$, the
distributions of both $\mathrm{inv}(\pi)$ and $\mathrm{des}(\pi)$ admit simple descriptions. The distribution of $\mathrm{inv}(\pi)$ is
equal to that of the sum $X_1 + \cdots + X_{n-1}$, where the random variables $X_i$, $1 \le i \le n-1$, are
independent with $X_i$ uniformly distributed over the set $\{0, 1, \ldots, i\}$. To obtain the distribution
of $\mathrm{des}(\pi)$, we need the sum $X_1 + \cdots + X_{n-1} + X_n$, where the $X_i$ are independent and uniformly
distributed in the interval $[0, 1]$. The probability that this sum lies in the interval $[d, d+1]$ equals
the probability that $\mathrm{des}(\pi)$ equals $d$. According to Knuth the first of these two results was
noticed by O. Rodriguez in 1839. The result about $\mathrm{des}(\pi)$ was alluded to by Barton and Mallows
[2] . An elegant proof is due to Stanley ^H] • 
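
The description of the law of $\mathrm{inv}(\pi)$ for permutations of $\{1, \ldots, n\}$ can be checked exactly for small $n$. The sketch below is an illustration only; the function names are hypothetical, and it compares the law obtained by enumerating all permutations with the law of a sum of independent uniform variables on $\{0, \ldots, i\}$, using exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def inv_distribution(n):
    """Exact law of inv(pi) for pi uniform over permutations of {1, ..., n}."""
    counts = Counter(
        sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        for p in permutations(range(1, n + 1))
    )
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


def sum_of_uniforms_distribution(n):
    """Exact law of X_1 + ... + X_{n-1}, with X_i uniform on {0, ..., i}, independent."""
    law = {0: Fraction(1)}
    for i in range(1, n):
        new = {}
        for s, p in law.items():
            for x in range(i + 1):
                new[s + x] = new.get(s + x, Fraction(0)) + p / (i + 1)
        law = new
    return law


for n in range(1, 6):
    print(n, inv_distribution(n) == sum_of_uniforms_distribution(n))
```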
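Normal approximations to $\mathrm{des}(\pi)$ and $\mathrm{inv}(\pi)$ in this special case can be obtained using these
results and standard versions of the central limit theorem. The bounds

$$\left| P\left( \frac{\mathrm{des}(\pi) - (n-1)/2}{\sqrt{(n+1)/12}} \le x \right) - \Phi(x) \right| \le \frac{C}{\sqrt{n}}, \tag{1.1a}$$

$$\left| P\left( \frac{\mathrm{inv}(\pi) - \frac{1}{2}\binom{n}{2}}{\sqrt{n(n-1)(2n+5)/72}} \le x \right) - \Phi(x) \right| \le \frac{C}{\sqrt{n}}, \tag{1.1b}$$
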
where $C$ is a constant and $\Phi$ is the standard normal distribution, were proved using the method of
exchangeable pairs ^HEOJ by Fulman [0]. Other proofs of Ijl.loj) using Stein's method are sketched 
in [U and [TO]. 

From the survey by Barton and Mallows [2], it appears that the asymptotic normality of a quantity
closely related to $\mathrm{des}(\pi)$, where $\pi$ is a uniformly distributed permutation of the set $\{1, \ldots, n\}$,
was stated by Bienaymé in 1874 (Bull. Soc. Math. France, vol. 2, pp. 153-154). Bienaymé was
interested in statistical applications. So were Levene and Wolfowitz [15], who stated that runs were
widely used in quality control and in the study of economic time series. Runs are the monotone
segments within a sequence of numbers and are closely related to descents. An early proof of the
asymptotic normality of descents, which is implied by (1.1a), is due to Wolfowitz [21].

Let $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$ be a multiset, where $n_a$, $1 \le a \le h$, are positive integers. Let $n =
n_1 + n_2 + \cdots + n_h$ be the number of elements of the multiset. Let $\alpha$ be a fixed number in $(0, 1)$.
We assume that $n_a \le \alpha n$ for $1 \le a \le h$. Let $\pi$ be a uniformly distributed permutation of this
multiset. We consider $\mathrm{inv}(\pi)$ and $\mathrm{des}(\pi)$ in this more general situation. The bounds that we obtain
for the errors in the normal approximations to these quantities depend upon $\alpha$ and become infinite
as $\alpha \to 1$. Let $h : \mathbb{R} \to \mathbb{R}$ be a bounded and piecewise continuously differentiable function and let
$\beta = \max(1/2, \alpha)$. We use the size-bias version of Stein's method introduced by Baldi, Rinott and
Stein and prove that, for $n$ large enough,
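
$$\left| E\,h\!\left( \frac{\mathrm{inv}(\pi) - \mu}{\sigma} \right) - \Phi h \right| \le \frac{C \|h\|}{\beta(1-\beta)\left(\beta(1-\beta) n^{1/2} - C_1 n^{-1/2}\right)} + \frac{C \|Dh\|}{\left(\beta(1-\beta) n^{1/3} - C_2 n^{-2/3}\right)^{3/2}},$$

where $C$, $C_1$, and $C_2$ are some positive constants, $\Phi h$ is the expectation of $h$ with respect to the
standard normal distribution, and $\mu$ and $\sigma^2$ are the mean and variance of $\mathrm{inv}(\pi)$, respectively. If
$\alpha \ge 1/2$, then $\beta = \alpha$. Therefore the bound above diverges as $\alpha \to 1$. We prove a similar result for
$\mathrm{des}(\pi)$.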

Bounds such as the one given in the previous paragraph require $h$ to be continuous. Goldstein
[10] has proved a normal approximation theorem that holds for non-smooth $h$. We use that theorem
to prove that

$$\left| P\left( \frac{\mathrm{inv}(\pi) - \mu}{\sigma} \le x \right) - \Phi(x) \right| \le \frac{C(\beta)}{\sqrt{n}}.$$

These results are contained in Theorems 2.12 and 2.16 of this paper. The quantity $C(\beta)$ diverges
when $\alpha \to 1$. As before, $\beta = \max(1/2, \alpha)$. When the $n$ elements of the multiset are distinct, with
$n \ge 2$, we may use $\alpha = 1/n$ and $\beta = 1/2$. Therefore the results stated above imply (1.1a) and (1.1b).

The generating function of the number of permutations of a multiset with a given number of
inversions is a rational function. Using this generating function, Diaconis has shown that the
asymptotic distribution of $\mathrm{inv}(\pi)$, where $\pi$ is uniformly distributed over permutations of a multiset,
is normal. Theorem 2.12 about $\mathrm{inv}(\pi)$ is accompanied by an error bound of the correct order, which
is $O(1/\sqrt{n})$, and the dependence of the error bound on $\alpha$ is also explicitly shown in our theorem.
The generating function for the number of permutations of a multiset with a given number of
descents, which is related to Foata's correspondence, was found by MacMahon [14, 16]. However,
normal approximations to this quantity, such as the approximation given in Theorem 2.16, do not
seem to be available.

Segments of $\pi(1), \ldots, \pi(n)$ between successive descents, or runs, are in ascending order. Knuth
[14] has stated that runs are important in the study of sorting algorithms because runs are segments
that are already in sorted order. Among the applications of descents and inversions to the study
of sorting algorithms, multiway merging with replacement selection merits special mention. In this
sorting method, the given sequence is first split into runs and the runs are merged together. Our
results are pertinent to sorting algorithms if the keys used for sorting are allowed to repeat. For
example, Theorem 2.16 about $\mathrm{des}(\pi)$ gives an idea of how many runs to expect if multiway merging
is used on a sequence of records with repeated keys.
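
The relation between runs and descents can be made concrete with a short sketch. The code below is illustrative only and is not the replacement selection procedure itself; `split_into_runs` is a hypothetical helper that cuts a sequence of keys at each descent, so a non-empty sequence has one more run than it has descents. Repeated keys are kept in the same run, consistent with a descent requiring a strict decrease.

```python
def split_into_runs(keys):
    """Split keys into maximal non-decreasing segments (cut at every descent)."""
    runs = []
    for k in keys:
        if runs and runs[-1][-1] <= k:
            runs[-1].append(k)      # no descent here: extend the current run
        else:
            runs.append([k])        # first key, or a descent: start a new run
    return runs


def count_descents(keys):
    """Number of positions i with keys[i] > keys[i + 1]."""
    return sum(1 for x, y in zip(keys, keys[1:]) if x > y)


keys = [2, 2, 1, 3, 3, 1, 2]
print(split_into_runs(keys), count_descents(keys))
```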

Descents and inversions have been used as test statistics in the special case where $\pi$ is a
permutation of $\{1, 2, \ldots, n\}$. As already mentioned, early work on runs and descents was stimulated
by statistical applications. Of the ten empirical tests for the randomness of a sequence of distinct
numbers discussed by Knuth [13], one is based on runs and descents. Taking our results into
account, inversions and descents can be used to test if a given permutation of a multiset of numbers
is random. There are other ways to test if a given permutation of a multiset of numbers is random.
If a permutation passes $n$ empirical tests for randomness but fails the $(n+1)$st, it is not random.
Therefore having a greater number of empirical tests available makes for more robust testing [13].

DNA sequences are strings of the four letters A, C, G, and T. It is now well known that these 
sequences are far from random. It has even been suggested that these sequences are similar to
human languages fS]. Some commonly used compression algorithms such as the Lempel-Ziv method 
fail to compress typical DNA sequences, however [12]. The entropy estimates of DNA sequences
given in ^2j and |Hj proceed by dividing the sequence into blocks in some way. For instance, blocks 
of 6 consecutive letters are considered in [12]. These entropy estimates show that DNA sequences
are not random. 

In Section 3, we report the descents and inversions of the 19th chromosome of the human 
genome mainly as an illustration. We consider all 24 possible orderings of A, C, G, and T. With 
respect to each of these orderings, a calculation of descents and inversions shows that the number 
of descents and inversions of the DNA sequence departs from the mean by a large multiple of the 
standard deviation. It may be of some interest that this method of showing the DNA sequence to 
be far from random considers only single letters without dividing them into blocks. 

Although we consider all possible orderings of A, C, G, and T, it must be noted that the
molecular weights of the corresponding compounds imply the order $C < T < A < G$. This is as
natural as any order one can hope to find among four physical objects.
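
For a long string over a four-letter alphabet, descents and inversions with respect to a chosen ordering can be counted in a single pass. The following Python sketch is an illustration under stated assumptions and is not the computation reported in Section 3; the function names and the example string are hypothetical, and `order` lists the letters from smallest to largest.

```python
def count_inversions_small_alphabet(seq, order):
    """Count pairs i < j with seq[i] ranked strictly above seq[j] under `order`.

    Runs in O(len(seq) * len(order)) time, which suits long strings over few letters.
    """
    rank = {letter: r for r, letter in enumerate(order)}
    seen = [0] * len(order)              # seen[r] = letters of rank r read so far
    total = 0
    for letter in seq:
        r = rank[letter]
        total += sum(seen[r + 1:])       # earlier letters that are strictly larger
        seen[r] += 1
    return total


def count_descents_ordered(seq, order):
    """Count positions i with seq[i] ranked strictly above seq[i + 1] under `order`."""
    rank = {letter: r for r, letter in enumerate(order)}
    return sum(1 for x, y in zip(seq, seq[1:]) if rank[x] > rank[y])


# Ordering C < T < A < G, as suggested by the molecular weights.
print(count_inversions_small_alphabet("GATTACA", "CTAG"))
print(count_descents_ordered("GATTACA", "CTAG"))
```

All 24 orderings can be covered by looping over `itertools.permutations("ACGT")` and passing each ordering as `order`.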

Our interest in permutations of multisets was provoked by their connection to riffle shuffles of
decks with repeated cards.

We do not give explicit numerical constants in our Theorems 2.12 and 2.16 about descents
and inversions of permutations of multisets. It is worth noting that explicit numerical constants
are not given for most of the detailed examples in Stein's book [20], and all the examples in the
papers by Baldi, Rinott, and Stein [1], by Goldstein and Rinott and by Rinott and Rotar [17].
Furthermore, even the asymptotic normality for descents of permutations of multisets implied by
Theorem 2.16 is a new result, and so is Theorem 2.12, which shows the dependence of the bounds
for normal approximation on the size of the multiset and the parameter $\alpha$ that characterizes the
multiset.

2 Descents and inversions of permutations of multisets 

If $W \ge 0$ is a non-negative and integrable random variable, the distribution of $W^*$ is said to be
$W$-size biased if $E(W f(W)) = EW\, E(f(W^*))$ for all continuous functions $f$ for which the expectation
on the left-hand side of the equality exists.
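
For a discrete random variable the size biased distribution has an explicit form: $P(W^* = w) = w\,P(W = w)/EW$. The sketch below is an illustration, not part of the original text; it assumes $W$ takes finitely many non-negative values with $EW > 0$, and the helper names are hypothetical. It checks the defining identity with exact rational arithmetic.

```python
from fractions import Fraction


def size_biased_pmf(pmf):
    """pmf maps each value w >= 0 to P(W = w); returns the pmf of W* (needs EW > 0)."""
    mean = sum(w * p for w, p in pmf.items())
    return {w: w * p / mean for w, p in pmf.items() if w * p != 0}


def expect(f, pmf):
    """Expectation of f under a finite pmf."""
    return sum(f(w) * p for w, p in pmf.items())


# W uniform on {0, 1, 2, 3}; f is an arbitrary test function.
pmf = {w: Fraction(1, 4) for w in range(4)}
star = size_biased_pmf(pmf)
f = lambda w: w * w + 1
lhs = expect(lambda w: w * f(w), pmf)              # E(W f(W))
rhs = expect(lambda w: w, pmf) * expect(f, star)   # EW * E f(W*)
print(lhs == rhs)
```

For a 0-1 valued variable the same function returns a distribution concentrated at 1, which is the case used later in the article.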

Stein's method JH] |20j refers to the use of auxiliary randomization to find normal approxima- 
tions to the distribution of some random variables. In the theorem below, the auxiliary random- 
ization requires the construction of W* which must be W-size biased. The theorem below can be 
found in [1], but we follow its formulation in [11].

Theorem 2.1. Let $W$ be a non-negative random variable with $EW = \mu$ and $\mathrm{Var}(W) = \sigma^2$. Let
$W^*$ be jointly defined with $W$ such that its distribution is $W$-size biased. Let $h$ be a function from
$\mathbb{R}$ to $\mathbb{R}$ such that $h$ is continuous and its derivative $Dh$ is piecewise continuous. Then

$$\left| E\,h\!\left( \frac{W - \mu}{\sigma} \right) - \Phi h \right| \le 2 \|h\| \frac{\mu}{\sigma^2} \sqrt{\mathrm{Var}\left(E(W^* - W \mid W)\right)} + \|Dh\| \frac{\mu}{\sigma^3} E(W^* - W)^2,$$

where $\Phi h$ is the expectation of $h$ with respect to the standard normal distribution and $\|\cdot\|$ is the
supremum norm.

When $h$ is the indicator function of the half line $(-\infty, x]$, the following theorem found in [10]
applies. Its proof uses a smoothing inequality and other techniques found in [17].

Theorem 2.2. Let $W$ be a non-negative random variable with $EW = \mu$ and $\mathrm{Var}(W) = \sigma^2$. Let
$W^*$ be jointly defined with $W$ such that its distribution is $W$-size biased. Let $|W^* - W| \le B$ and
let $A = B/\sigma$. Let $B \le \sigma^{3/2}/\sqrt{6\mu}$. Then

$$\left| P\left( \frac{W - \mu}{\sigma} \le x \right) - \Phi(x) \right| \le 0.4A + \frac{\mu}{\sigma}\left(64A^2 + 4A^3\right) + \frac{23\mu}{\sigma^2} \sqrt{\mathrm{Var}\left(E(W^* - W \mid W)\right)},$$

where $\Phi$ is the standard normal distribution.

In Theorems 2.1 and 2.2 above, we added the superscript $*$ to $W$ to denote a random variable
with the $W$-size biased distribution. In the lemma below, random variables $X_i$ with the superscript
$*$ do not necessarily have the $X_i$-size biased distribution. Here and later, our convention is to use
the superscript $*$ when random variables are constructed as a part of the size biasing procedure.
This notation is due to [1].

The construction of size biased variables in this paper will be based on the following lemma
found in [1] and [11].
Lemma 2.3. Let $W = X_1 + X_2 + \cdots + X_n$, where each $X_i$ is a non-negative random variable
with finite mean. Let $I$ be a random variable which is independent of the $X_i$ and which satisfies
$P(I = i) = EX_i / \sum_{j=1}^{n} EX_j$. Define $W^*$ as $W^* = X_1^* + X_2^* + \cdots + X_n^*$, where for given $I$, $X_I^*$ has
the $X_I$-size biased distribution and

$$P\left((X_1^*, X_2^*, \ldots, X_n^*) \in A \mid I = i, X_i^* = x\right) = P\left((X_1, X_2, \ldots, X_n) \in A \mid X_i = x\right). \tag{2.1}$$

Then $W^*$ has the $W$-size biased distribution.



Whenever Lemma 2.3 is applied here, the $X_i$ are 0-1 valued random variables, and the
size biased distribution for such variables is concentrated at 1. Therefore, for our purposes, (2.1)
can be written as $P((X_1^*, X_2^*, \ldots, X_n^*) \in A \mid I = i) = P((X_1, X_2, \ldots, X_n) \in A \mid X_i = 1)$.

Let $\pi$ be a uniformly distributed permutation of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$ and $n = n_1 +
n_2 + \cdots + n_h$. Each $n_a$, $1 \le a \le h$, is a positive integer. The symbols $i, j, k, l$, with and without
numerical subscripts, are used to index the set $\{1, 2, \ldots, n\}$. The symbols $a, b, c, d$ are used to index
the set $\{1, 2, \ldots, h\}$. We also assume $n_a \le \alpha n$ for $1 \le a \le h$ and for some $\alpha$ in $(0, 1)$, $n \ge 4$, and
$h \ge 2$.

Define $X_{ij}$, for $i < j$, as 1 if $\pi(i) > \pi(j)$ and as 0 otherwise. Some facts about the joint
distribution of $X_{ij}$ will be necessary. Denote the probabilities

$P(X_{ij} = 1)$ with $i < j$,

$P(X_{ij_1} = 1, X_{ij_2} = 1)$ with $i < j_1$ and $i < j_2$,
$P(X_{i_1 j} = 1, X_{i_2 j} = 1)$ with $i_1 < j$ and $i_2 < j$,
$P(X_{ik} = 1, X_{kj} = 1)$ with $i < k < j$,

$P(X_{i_1 j_1} = 1, X_{i_2 j_2} = 1)$ with $i_1 < j_1$, $i_2 < j_2$, and $(i_1, j_1) \ne (i_2, j_2)$

by $p_1$, $p_2$, $p_3$, $p_4$, and $p_5$, respectively. Elementary arguments can be used to deduce formulas, such
as $p_1 = \sum_{a<b} n_a n_b / (n(n-1))$ and $p_4 = \sum_{a<b<c} n_a n_b n_c / (n(n-1)(n-2))$, for $p_1$, $p_2$, $p_3$, $p_4$, and
$p_5$. From such formulas, we deduce
The formulas in (2.2) will be used to derive expressions for $\mathrm{Var}(\mathrm{inv}(\pi))$ and $\mathrm{Var}(\mathrm{des}(\pi))$.
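
The closed forms quoted above for $p_1$ and $p_4$ can be confirmed by enumeration on a small multiset. The following Python sketch is illustrative; the function names are hypothetical, and it relies on the fact that all distinct arrangements of a multiset are equally likely under the uniform distribution. Positions 1, 2, 3 are used for the check, which loses no generality because the joint law of the entries does not depend on which positions are chosen.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def p1_p4_formula(multiset):
    """p_1 and p_4 from the closed forms in terms of the multiplicities n_a."""
    mults = [m for _, m in sorted(Counter(multiset).items())]
    n = len(multiset)
    p1 = Fraction(sum(x * y for x, y in combinations(mults, 2)), n * (n - 1))
    p4 = Fraction(sum(x * y * z for x, y, z in combinations(mults, 3)),
                  n * (n - 1) * (n - 2))
    return p1, p4


def p1_p4_enumerated(multiset):
    """p_1 and p_4 by listing every distinct arrangement (needs n >= 3)."""
    perms = set(permutations(multiset))
    p1 = Fraction(sum(1 for p in perms if p[0] > p[1]), len(perms))
    p4 = Fraction(sum(1 for p in perms if p[0] > p[1] > p[2]), len(perms))
    return p1, p4


example = (1, 1, 2, 2, 3)
print(p1_p4_formula(example), p1_p4_enumerated(example))
```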

The assumption $n_a \le \alpha n$, for some $\alpha \in (0, 1)$, is used in the two lemmas below. The lemmas,
however, are worded in terms of $\beta = \max(1/2, \alpha)$ and use the weaker assumption $n_a \le \beta n$ for
$\beta \in [1/2, 1)$. In both the lemmas the assumption $n_a \le \beta n$ implies $h \ge 2$.

Lemma 2.4. Assume $\beta \in [1/2, 1)$, $n_a \ge 0$ for all $a$, and $\sum_a n_a = n$. If $n_a \le \beta n$ for $1 \le a \le h$,
then $2\beta(1-\beta) n^2 \le n^2 - \sum_a n_a^2 \le n^2$ and $3\beta(1-\beta) n^3 \le n^3 - \sum_a n_a^3 \le n^3$.
Proof. To lower bound $n^2 - \sum_a n_a^2$, note that $x \ge y \ge 0$, $\delta > 0$, and $y - \delta \ge 0$ imply $(x + \delta)^2 + (y -
\delta)^2 > x^2 + y^2$. Thus for a given sum $x + y$, the quantity $x^2 + y^2$ increases when the difference $x - y$
is increased. Thus given $\sum_a n_a = n$ and the constraints $n_a \ge 0$, the quantity $\sum_a n_a^2$ is increased
whenever two positive numbers are chosen from $n_a$, $1 \le a \le h$, and the lesser of them is decreased
and the greater increased by the same amount. Therefore, under the constraints $n_a \le \beta n$, $\sum_a n_a^2$
is maximum when $n_1 = \beta n$, $n_2 = (1 - \beta) n$, and $n_a = 0$ for $a > 2$. The lower bound for $n^3 - \sum_a n_a^3$
is also obtained when $n_1 = \beta n$, $n_2 = (1 - \beta) n$, and $n_a = 0$ for $a > 2$. The upper bounds are
trivial. □
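
The inequalities of Lemma 2.4 can be spot-checked numerically. The sketch below is an illustration only and not a substitute for the proof; it enumerates all integer multiplicity vectors (as partitions of $n$) with largest part at most $\beta n$ and tests both pairs of inequalities exactly. The names `partitions` and `lemma_2_4_holds` are hypothetical.

```python
from fractions import Fraction
from math import floor


def partitions(n, max_part):
    """Yield partitions of n into positive parts <= max_part, in non-increasing order."""
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def lemma_2_4_holds(n, beta):
    """Check both inequalities of Lemma 2.4 for every integer multiset with n_a <= beta * n."""
    for parts in partitions(n, floor(beta * n)):
        s2 = sum(p ** 2 for p in parts)
        s3 = sum(p ** 3 for p in parts)
        if not (2 * beta * (1 - beta) * n ** 2 <= n ** 2 - s2 <= n ** 2):
            return False
        if not (3 * beta * (1 - beta) * n ** 3 <= n ** 3 - s3 <= n ** 3):
            return False
    return True


print(all(lemma_2_4_holds(n, Fraction(3, 4)) for n in range(4, 13)))
```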

Concerning the lemma below, it is worth noting that $\beta^4 - 4\beta^2 + 4\beta - 1 = (1 - \beta)^2(\beta^2 + 2\beta - 1) > 0$
for $\beta \in [1/2, 1)$.

Lemma 2.5. Assume $\beta \in [1/2, 1)$, $n_a \ge 0$ for all $a$, and $\sum_a n_a = n$. If $n_a \le \beta n$ for $1 \le a \le h$, then

$$(\beta^4 - 4\beta^2 + 4\beta - 1) n^4 \le n^4/3 + \Big(\sum_a n_a^2\Big)^2 - (4n/3) \sum_a n_a^3 \le n^4/3.$$



Proof. The upper bound follows from the inequality $n \sum_a n_a^3 \ge (\sum_a n_a^2)^2$.

We prove the lower bound assuming $\beta > 1/2$. The proof for $\beta = 1/2$ can be obtained with
minor changes. The proof will make careful use of the Kuhn-Tucker conditions as explained in [3,
Theorem 9.2-3].

We attempt to minimize $J(n_1, n_2, \ldots, n_h) = (\sum_a n_a^2)^2 - (4n/3) \sum_a n_a^3$ subject to the affine
constraints $\sum_a n_a = n$, $-n_a \le 0$, and $n_a - \beta n \le 0$, where the last two constraints hold for
$1 \le a \le h$. We assume $n_1 \ge n_2 \ge \cdots \ge n_h \ge 0$ without loss of generality.

Let $DJ$ be the gradient vector whose $a$th entry is

$$\frac{\partial J}{\partial n_a} = 4 n_a \sum_b n_b^2 - 4 n\, n_a^2 = \sum_b 4 n_a n_b (n_b - n_a),$$

where the second equality uses $n = \sum_b n_b$.
The sum of the entries of $DJ$ must be 0 because the term $4 n_a n_b (n_b - n_a)$ in $\partial J/\partial n_a$ is canceled
by the term $4 n_a n_b (n_a - n_b)$ in $\partial J/\partial n_b$. If there exists an $a$ such that $n_1 > n_a > 0$, then the first
entry of $DJ$ must be strictly negative and therefore some other entry must be strictly positive.

Let $u \in \mathbb{R}^h$ be the vector with all entries equal to 1. Let $v_a \in \mathbb{R}^h$ be the vector with its $a$th
entry equal to $-1$ and all other entries equal to 0. Let $w_1 = -v_1$. Note that $n_a - \beta n = 0$ is possible
only if $a = 1$ as we have assumed $\beta > 1/2$ and $n_1 \ge n_2 \ge \cdots$.

Suppose $(n_1, n_2, \ldots, n_h)$ is a local minimum of $J$. The Kuhn-Tucker conditions require that it
must be possible to make all entries of $DJ$ zero by adding multiples of certain vectors. We are
always allowed to add any real multiple of $u$ because the constraint $\sum_a n_a = n$ is always in force.
We are allowed to add a positive multiple of $v_a$ if and only if $n_a = 0$ because the constraint $-n_a \le 0$
can then be violated by making an infinitesimal change to $n_a$. We are allowed to add a positive
multiple of $w_1$ if and only if $n_1 - \beta n = 0$ by a similar reason.

Let us first consider the type of local minimum where the Kuhn-Tucker conditions can be
satisfied without adding a positive multiple of $w_1$. Suppose $n_1 > n_a > 0$ for some $a$ for such a local
minimum. Then the first entry of $DJ$ is strictly negative and some other entry is strictly positive.
If the positive entry is $\partial J/\partial n_b$, then $n_b$ must be nonzero and therefore $n_b > 0$. Such a $DJ$ cannot
be made zero by adding a multiple of $u$ and positive multiples of $v_c$ corresponding to $n_c = 0$. The
only way to make the $b$th entry of $DJ$ equal to 0 is by adding a negative multiple of $u$. But this
means the first entry remains negative and nonzero, and the only way to make it 0 is by adding a
multiple of $w_1$ which is not allowed by assumption. Therefore any local minimum of this type must
have $n_1 = n_2 = \cdots = n_s = n/s$ and $n_c = 0$ for $c > s$, where $2 \le s \le h$. The value of $J$ at such a
point is $-n^4/(3s^2)$. Since $s \ge 2$,

$$J \ge -n^4/12 \tag{2.3}$$

at any local minimum of this type.

We next consider the type of local minimum where it is necessary to add a positive multiple
of $w_1$ to satisfy the Kuhn-Tucker conditions. At such a local minimum $n_1 = \beta n$ and $n_1 > n_2 \ge
\cdots \ge n_h \ge 0$. Suppose $n_h = 0$. Then the first entry of $DJ$ is strictly negative, some other entry
is strictly positive, and the last entry is 0. We cannot make all those three entries zero by adding
a real multiple of $u$, a positive multiple of $w_1$, and a positive multiple of $v_h$ to $DJ$. Thus $n_h > 0$.
Next suppose that the $a$th entry of $DJ$ is not equal to the $b$th entry of $DJ$ for some $a, b > 1$. It
is impossible to make the 1st, $a$th, and $b$th entries of $DJ$ zero by adding a multiple of $u$ and a
positive multiple of $w_1$. Therefore, all entries of $DJ$ except the first must be equal. The expression
for $\partial J/\partial n_a$ given above is quadratic in $n_a$. Thus we may conclude that at any local minimum of
this type $n_1 = \beta n$ and $n_2, \ldots, n_h$ can take on at most two different values. Although the argument
assumed $h \ge 3$, the conclusion holds when $h = 2$ as well. When $h = 2$ and a positive multiple of $w_1$
is added to $DJ$ to satisfy the Kuhn-Tucker conditions, we must have $n_1 = \beta n$ and $n_2 = (1 - \beta) n$.

We now consider the value of $J$ assuming that $n_1 = \beta n$, that $x$ of the $n_a$s equal $n_x$, that $y$ of
the $n_a$s equal $n_y$, and that $x n_x + y n_y = n(1 - \beta)$. We also assume that $x$ is a positive integer, that
$y$ is a non-negative integer, that $x \ge y$, and of course that $n_x$ and $n_y$ are non-negative. Then

$$J = (\beta^4 n^4 - 4\beta^3 n^4/3) + (x n_x^2 + y n_y^2)^2 + 2\beta^2 n^2 (x n_x^2 + y n_y^2) - (4n/3)(x n_x^3 + y n_y^3),$$

which we will think of as a sum of four terms. It follows from elementary inequalities that the
minimum of $x n_x^2 + y n_y^2$ under the given constraints is $n^2(1 - \beta)^2/(x + y)$, and that the minimum
of $-(4n/3)(x n_x^3 + y n_y^3)$ occurs when $n_x = n(1 - \beta)$, $x = 1$, and $n_y = 0$. We can minimize each of
the four terms of $J$ separately to obtain

$$J \ge \beta^4 n^4 - 4\beta^3 n^4/3 - 4(1 - \beta)^3 n^4/3. \tag{2.4}$$

The value of $J$ at any local minimum of the type discussed in the previous paragraph must either
equal or exceed the lower bound in (2.4).

So far, we have proved that the value of $J$ at a local minimum satisfies the lower bound
given by either (2.3) or (2.4), depending upon the type of the local minimum. For $\beta \in [1/2, 1)$,
$\beta^4 - 4\beta^3/3 - 4(1 - \beta)^3/3 \le -1/12$ by an elementary argument. Therefore the lower bound for $J$
given by (2.4) holds at all local minima and the lower bound for $J + n^4/3$ stated in the lemma is
proved.

□ 
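
Lemma 2.5 and the polynomial identity noted before it can likewise be spot-checked with exact arithmetic. The sketch below is illustrative and does not replace the Kuhn-Tucker argument; it tests the two-sided bound over all integer multiplicity vectors with $n_a \le \beta n$ for small $n$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import floor


def partitions(n, max_part):
    """Yield partitions of n into positive parts <= max_part, in non-increasing order."""
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def lemma_2_5_holds(n, beta):
    """Check the two-sided bound of Lemma 2.5 for every integer multiset with n_a <= beta * n."""
    lower = (beta ** 4 - 4 * beta ** 2 + 4 * beta - 1) * n ** 4
    upper = Fraction(n ** 4, 3)
    for parts in partitions(n, floor(beta * n)):
        s2 = sum(p ** 2 for p in parts)
        s3 = sum(p ** 3 for p in parts)
        middle = Fraction(n ** 4, 3) + s2 ** 2 - Fraction(4 * n, 3) * s3
        if not (lower <= middle <= upper):
            return False
    return True


def identity_holds(beta):
    """beta^4 - 4 beta^2 + 4 beta - 1 == (1 - beta)^2 (beta^2 + 2 beta - 1)."""
    return (beta ** 4 - 4 * beta ** 2 + 4 * beta - 1
            == (1 - beta) ** 2 * (beta ** 2 + 2 * beta - 1))


print(all(lemma_2_5_holds(n, Fraction(2, 3)) for n in range(4, 13)))
```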



2.1 Inversions of permutations of multisets 

Let $W = \sum_{i<j} X_{ij}$. Then $W = \mathrm{inv}(\pi)$. We assume that $\pi$ is uniformly distributed over
permutations of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$.

Lemma 2.6. Let $\mu = EW$ and $\sigma^2 = \mathrm{Var}(W)$. Then
Proof. Since $\mu = \binom{n}{2} p_1$, where $p_1 = EX_{ij}$, and $p_1$ is given by (2.2), the expression for $\mu$ in the
lemma must hold.
We first show that

$$\sigma^2 = \binom{n}{2}(p_1 - p_1^2) + 2\binom{n}{3}(p_2 + p_3 + p_4 - 3p_1^2) + 6\binom{n}{4}(p_5 - p_1^2), \tag{2.5}$$

where the $p_i$ are given by (2.2). If $\mathrm{Var}(W)$ with $W = \sum_{i<j} X_{ij}$ is written as a sum of variances
and covariances of the $X_{ij}$, there are $\binom{n}{2}$ variance terms each of which is equal to $p_1 - p_1^2$. There
are $\binom{n}{3}$ terms of the form $2\,\mathrm{Covar}(X_{ij_1}, X_{ij_2})$ with $i < j_1 < j_2$ and each of those is equal to
$2(p_2 - p_1^2)$. We can account for terms of the form $2\,\mathrm{Covar}(X_{i_1 j}, X_{i_2 j})$ with $i_1 < i_2 < j$ and of the
form $2\,\mathrm{Covar}(X_{ik}, X_{kj})$ with $i < k < j$ similarly. Thus far we have explained the first two terms
of (2.5). All the other terms in the expansion of $\mathrm{Var}(W)$ are of the form $2\,\mathrm{Covar}(X_{i_1 j_1}, X_{i_2 j_2})$ with
$i_1 < j_1$, $i_2 < j_2$, and $(i_1, j_1) < (i_2, j_2)$ in lexicographic order. The last term of (2.5) follows if we
note that the number of such terms is $3\binom{n}{4}$.

The expression for $\sigma^2$ in the lemma is deduced using (2.2), (2.5), and the two inequalities
$\sum_a n_a^2 \le n^2$ and $\sum_a n_a^3 \le n^3$. □
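
Formula (2.5) can be verified exactly on a small multiset. In the sketch below, which is illustrative only, the probabilities $p_1, \ldots, p_5$ are obtained by enumeration at fixed positions and the right-hand side of (2.5) is compared with the variance of the inversion count computed directly. It assumes $n \ge 4$ and reads $p_5$ as referring to two pairs with four distinct positions; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def inversion_variance_two_ways(multiset):
    """Return (exact Var(inv), right-hand side of (2.5)) for a multiset with n >= 4."""
    perms = sorted(set(permutations(multiset)))   # equally likely arrangements
    n, total = len(multiset), len(perms)

    def prob(event):
        return Fraction(sum(1 for p in perms if event(p)), total)

    p1 = prob(lambda p: p[0] > p[1])
    p2 = prob(lambda p: p[0] > p[1] and p[0] > p[2])
    p3 = prob(lambda p: p[0] > p[2] and p[1] > p[2])
    p4 = prob(lambda p: p[0] > p[1] > p[2])
    p5 = prob(lambda p: p[0] > p[1] and p[2] > p[3])   # four distinct positions
    formula = (comb(n, 2) * (p1 - p1 ** 2)
               + 2 * comb(n, 3) * (p2 + p3 + p4 - 3 * p1 ** 2)
               + 6 * comb(n, 4) * (p5 - p1 ** 2))

    invs = [sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
            for p in perms]
    mean = Fraction(sum(invs), total)
    variance = Fraction(sum(v * v for v in invs), total) - mean ** 2
    return variance, formula


print(inversion_variance_two_ways((1, 1, 2, 3)))
```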

We now turn to the construction of the size biased variable $W^*$ required by Theorems 2.1 and
2.2. Let $I$ be uniformly distributed over all pairs $(i, j)$ with $1 \le i < j \le n$ and let it be independent
of $\pi$. Let $J = (a, b)$, for $h \ge a > b \ge 1$, with probability $n_a n_b / \sum_{c<d} n_c n_d$, and let $J$ be independent
of both $\pi$ and $I$. Now $\pi^*$ is constructed from $\pi$, $I$, and $J$ as follows. If $I = (i, j)$ and $\pi(i) > \pi(j)$,
then $\pi^* = \pi$. If $I = (i, j)$, $\pi(i) \le \pi(j)$ and $J = (a, b)$, $\pi^*$ is constructed in the following steps:

1. Let $i^*$ and $j^*$ be uniformly distributed over the sets $\{i \mid \pi(i) = a\}$ and $\{j \mid \pi(j) = b\}$,
respectively. They must be independent of each other and all other random variables.

2. If $\{i, j\} \cap \{i^*, j^*\} = \emptyset$, or $i = i^*, j \ne j^*$, or $i \ne i^*, j = j^*$, exchange $\pi(i)$ with $\pi(i^*)$ and $\pi(j)$
with $\pi(j^*)$ to get $\pi^*$.

3. If $i = j^*$, $j = i^*$, exchange $\pi(i)$ and $\pi(j)$ to get $\pi^*$.

4. If $i = j^*, j \ne i^*$, then $\pi^*(i) = \pi(i^*)$, $\pi^*(j) = \pi(j^*) = \pi(i)$, $\pi^*(i^*) = \pi(j)$, and $\pi^*(k) = \pi(k)$ if
$k \ne i, j, i^*$.

5. If $i \ne j^*, j = i^*$, then $\pi^*(i) = \pi(i^*) = \pi(j)$, $\pi^*(j) = \pi(j^*)$, $\pi^*(j^*) = \pi(i)$, and $\pi^*(k) = \pi(k)$
for $k \ne i, j, j^*$.

Finally, $W^* = \sum_{i<j} X^*_{ij}$, where $X^*_{ij}$ is 1 if $\pi^*(i) > \pi^*(j)$ and 0 otherwise.
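
The five steps above can be sketched in code and checked exhaustively on a small multiset. The following Python sketch is an illustration under stated assumptions: positions are 0-based, the second case is read as $\pi(i) \le \pi(j)$, and the function names are hypothetical. It computes the exact law of $\pi^*$ given $I = (i, j)$ and compares it with the law of $\pi$ conditioned on $\pi(i) > \pi(j)$, which is the property needed for $W^*$ to be $W$-size biased.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import permutations


def construct_pi_star(pi, i, j, i_star, j_star):
    """Apply steps 2-5 to pi (0-based positions), assuming pi[i] <= pi[j]."""
    s = list(pi)
    if i == j_star and j == i_star:            # step 3
        s[i], s[j] = s[j], s[i]
    elif i == j_star:                          # step 4 (here j != i_star)
        s[i], s[j], s[i_star] = pi[i_star], pi[j_star], pi[j]
    elif j == i_star:                          # step 5 (here i != j_star)
        s[i], s[j], s[j_star] = pi[i_star], pi[j_star], pi[i]
    else:                                      # step 2
        s[i], s[i_star] = s[i_star], s[i]
        s[j], s[j_star] = s[j_star], s[j]
    return tuple(s)


def law_of_pi_star(multiset, i, j):
    """Exact law of pi* given I = (i, j), for pi uniform over arrangements of multiset."""
    perms = sorted(set(permutations(multiset)))
    counts = Counter(multiset)
    pairs = [(a, b) for a in counts for b in counts if a > b]
    total = sum(counts[a] * counts[b] for a, b in pairs)
    law = defaultdict(Fraction)
    p_pi = Fraction(1, len(perms))
    for pi in perms:
        if pi[i] > pi[j]:
            law[pi] += p_pi                    # pi* = pi
            continue
        for a, b in pairs:                     # J = (a, b) with probability n_a n_b / total
            pos_a = [k for k, v in enumerate(pi) if v == a]
            pos_b = [k for k, v in enumerate(pi) if v == b]
            weight = (p_pi * Fraction(counts[a] * counts[b], total)
                      / (len(pos_a) * len(pos_b)))
            for i_star in pos_a:
                for j_star in pos_b:
                    law[construct_pi_star(pi, i, j, i_star, j_star)] += weight
    return dict(law)


def conditional_law(multiset, i, j):
    """Law of pi given pi[i] > pi[j]."""
    perms = [p for p in sorted(set(permutations(multiset))) if p[i] > p[j]]
    return {p: Fraction(1, len(perms)) for p in perms}


m = (1, 1, 2, 3)
print(all(law_of_pi_star(m, i, j) == conditional_law(m, i, j)
          for i in range(4) for j in range(i + 1, 4)))
```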

We prove below that $W^*$ has the $W$-size biased distribution. If $\pi$ were a uniformly distributed
permutation of $\{1, 2, \ldots, n\}$, it would be enough to exchange $\pi(i)$ and $\pi(j)$ if $\pi(i) < \pi(j)$ to get
$\pi^*$. The resulting $W^*$ would have the $W$-size biased distribution. However, since we are dealing
with a multiset here, $\pi(i) = \pi(j)$ is also a possibility. The construction of $\pi^*$ given above is not as
simple mainly because this possibility has to be dealt with.

The following lemma is needed to prove that $W^*$ has the $W$-size biased distribution. Subtraction
and union of multisets have the obvious meanings in the statement of the lemma. The lemma is
stated without proof.
Lemma 2.7. Let $\pi$ be a uniformly distributed permutation of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$. If one
$a$ out of $n_a$ possible choices is chosen uniformly from $\pi$ and changed to $b$, the resulting permutation
is a uniformly distributed permutation of the multiset $(\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\} - \{a\}) \cup \{b\}$. Similarly, if
one of $n_a$ $a$s and one of $n_b$ $b$s are picked uniformly and independently from $\pi$ and changed to $c$ and
$d$, respectively, then the resulting permutation is a uniformly distributed permutation of a possibly
new multiset.

Lemma 2.8. The random variable $W^*$ has the $W$-size biased distribution.

Proof. By Lemma 2.3 it is enough to show that $P(\pi^* \in A \mid I = (i, j)) = P(\pi \in A \mid \pi(i) > \pi(j))$.
Now

$$P(\pi^* \in A \mid I = (i, j)) = P(\pi \in A \mid \pi(i) > \pi(j))\, P(\pi(i) > \pi(j)) + P(\pi^* \in A \mid \pi(i) \le \pi(j), I = (i, j))\, P(\pi(i) \le \pi(j)).$$

The first term in the right hand side of the equation above is not conditioned on $I$ because $P(\pi^* \in
A \mid \pi(i) > \pi(j), I = (i, j)) = P(\pi \in A \mid \pi(i) > \pi(j), I = (i, j))$, by the construction of $\pi^*$, and because
$\pi$ is independent of $I$. Thus, if we can show $P(\pi^* \in A \mid \pi(i) \le \pi(j), I = (i, j)) = P(\pi \in A \mid \pi(i) >
\pi(j))$, the proof will be complete.

The proof is completed by the sequence of equalities below and the explanation that follows
them.

$$P(\pi^* \in A \mid \pi(i) \le \pi(j), I = (i, j))$$
$$= \sum_{a>b} P(\pi^* \in A \mid \pi(i) \le \pi(j), I = (i, j), J = (a, b))\, P(\pi(i) = a, \pi(j) = b \mid \pi(i) > \pi(j))$$
$$= \sum_{a>b} P(\pi \in A \mid \pi(i) = a, \pi(j) = b, I = (i, j), J = (a, b))\, P(\pi(i) = a, \pi(j) = b \mid \pi(i) > \pi(j))$$
$$= P(\pi \in A \mid \pi(i) > \pi(j)).$$

The first equality is true because $J$ is independent of $\pi$ and $I$, and $P(J = (a, b)) = P(\pi(i) =
a, \pi(j) = b \mid \pi(i) > \pi(j))$. The construction of $\pi^*$ from $\pi$, $I$, $J$ and Lemma 2.7 imply the second
equality. More specifically, we note that Lemma 2.7 implies that given $\pi(i) \le \pi(j)$, $I = (i, j)$ and
$J = (a, b)$, the arrangement $\pi^*(1), \pi^*(2), \ldots, \pi^*(n)$ with the $i$th and the $j$th numbers struck out is
a uniformly distributed permutation of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\} - \{a, b\}$. □

We now focus on finding a useful upper bound for $\mathrm{Var}(E(W^* - W \mid \pi))$. Given a sequence of
numbers $s_1, s_2, \ldots, s_p$, we throw $q$ and $r$ into the same set if and only if $s_q = s_r$. In this way, we
get a partition of $\{1, 2, \ldots, p\}$ into sets, and we may arrange the sets of the partition so that the
values of $s_q$ for $q$ in the set increase. We refer to such an ordered partition of $\{1, 2, \ldots, p\}$ as the
relative order of $s_1, s_2, \ldots, s_p$. For our purpose, it is sufficient to note that the number of possible
relative orders is bounded by $2^p p!$.
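
The notion of relative order, and the crude bound $2^p p!$ on the number of relative orders, can be illustrated with a short sketch. The code below is not from the original article; `relative_order` and `count_relative_orders` are hypothetical names, and the count is obtained by brute force over all sequences with entries in $\{0, \ldots, p-1\}$, which suffices because a sequence of length $p$ has at most $p$ distinct values.

```python
from itertools import product
from math import factorial


def relative_order(seq):
    """Ordered partition of positions 1..p: blocks of equal values, sorted by value."""
    blocks = {}
    for q, s in enumerate(seq, start=1):
        blocks.setdefault(s, []).append(q)
    return tuple(tuple(blocks[s]) for s in sorted(blocks))


def count_relative_orders(p):
    """Number of distinct relative orders of sequences of length p (brute force)."""
    return len({relative_order(seq) for seq in product(range(p), repeat=p)})


print(relative_order([3, 1, 3, 2]))
for p in range(1, 5):
    print(p, count_relative_orders(p), 2 ** p * factorial(p))
```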

Lemma 2.9. Let $P_1$ be the probability that $\pi(1), \pi(2), \ldots, \pi(p)$ occur in a certain relative order
when $\pi$ is a uniformly distributed permutation of $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$, and let that probability be $P_2$
if $\pi$ is a uniformly distributed permutation of the multiset $\{1^{n'_1}, 2^{n'_2}, \ldots, h^{n'_h}\}$. Assume that $n_a \ge n'_a$,
$\sum_a (n_a - n'_a) \le 5$. We allow $n'_a = 0$. If $p \le 5$ then $|P_1 - P_2| \le C/n$ for some constant $C$.
Proof. The proof is obtained by writing down formulas for $P_1$ and $P_2$. We show the proof for the
relative order $\pi(1) < \pi(2) < \cdots < \pi(p)$.

Let $n' = \sum_a n'_a$. The probability $P_1$ is given by

$$P_1 = \sum \frac{n_{a_1} n_{a_2} \cdots n_{a_p}}{n(n-1) \cdots (n-p+1)}, \tag{2.6}$$

where the sum is taken over $1 \le a_1 < a_2 < \cdots < a_p \le h$. The formula for $P_2$ is obtained by adding
a prime to all the $n$s in (2.6). Now

$$P_1 - P_2 = \sum \frac{n_{a_1} n_{a_2} \cdots n_{a_p} - n'_{a_1} n'_{a_2} \cdots n'_{a_p}}{n(n-1) \cdots (n-p+1)} - P_2 \left( 1 - \frac{n'(n'-1) \cdots (n'-p+1)}{n(n-1) \cdots (n-p+1)} \right),$$

$0 \le n - n' \le 5$, and

$$n_{a_1} n_{a_2} \cdots n_{a_p} - n'_{a_1} n'_{a_2} \cdots n'_{a_p} \le n_{a_1} n_{a_2} \cdots n_{a_p} \left( (n_{a_1} - n'_{a_1})/n_{a_1} + \cdots + (n_{a_p} - n'_{a_p})/n_{a_p} \right),$$

together imply $|P_1 - P_2| \le C/n$. □

Lemma 2.10. Let $f(\pi(1), \pi(2), \ldots, \pi(p))$ and $g(\pi(p+1), \pi(p+2), \ldots, \pi(p+q))$ be functions that
depend only upon the relative order of their argument lists. Assume that $|f|$, $|g|$, $p$, and $q$ are all
upper bounded by 5. If $\pi$ is a uniformly distributed permutation of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$,
then

$$\left| \mathrm{Covar}\left( f(\pi(1), \pi(2), \ldots, \pi(p)),\ g(\pi(p+1), \pi(p+2), \ldots, \pi(p+q)) \right) \right| \le C/n$$

for some constant $C$.

Proof. It is enough to consider $f$ and $g$ to be indicator functions that are 1 for a certain relative
order of their argument lists and 0 for all other relative orders. All other $f$ and $g$ are linear
combinations of a constant number of indicator functions with coefficients that are bounded by
constants.

We state the proof assuming $f$ and $g$ are 1 if their arguments are in strictly increasing order
and 0 otherwise. Let $P(f = 1) = P_1$ and $P(g = 1) = P_2$. Then

$$P(fg = 1) = \sum P(\pi(1) < \pi(2) < \cdots < \pi(p) \mid \pi(p+1) = a_1, \ldots, \pi(p+q) = a_q)\, P(\pi(p+1) = a_1, \ldots, \pi(p+q) = a_q),$$

where the sum is over $1 \le a_1 < a_2 < \cdots < a_q \le h$. By the previous Lemma 2.9, each conditional
probability in the sum above is $P_1 + O(1/n)$. Therefore, $P(fg = 1) = P_1 P_2 + O(1/n)$ and
$\mathrm{Covar}(f, g) = O(1/n)$. □

Lemma 2.11.

$$\mathrm{Var}\left(E(W^* - W \mid \pi)\right) \le \frac{C n^5}{\left(n^2 - \sum_a n_a^2\right)^2}$$

for some constant $C$.
Proof. If $\pi(i) > \pi(j)$, $E(W^* - W \mid \pi, I = (i, j)) = 0$. If $\pi(i) \le \pi(j)$,

$$E(W^* - W \mid \pi, I = (i, j)) = \frac{1}{\sum_{a>b} n_a n_b} \sum_{(i^*, j^*)} \sum_{l=1}^{n} \psi_\pi(i, j, i^*, j^*, l),$$

where $(i^*, j^*)$ takes all $\sum_{a>b} n_a n_b$ possible values with $\pi(i^*) > \pi(j^*)$ and $\psi_\pi(i, j, i^*, j^*, l)$ is the
change in the number of inversions between position $l$ and positions $i, j, i^*, j^*$ when $\pi(i)$, $\pi(j)$,
$\pi(i^*)$, $\pi(j^*)$ are exchanged to construct $\pi^*$. Note that $|\psi_\pi| \le 4$. We now have

$$E(W^* - W \mid \pi) = \frac{1}{\binom{n}{2} \sum_{a>b} n_a n_b} \sum_{(i, j)} \sum_{(i^*, j^*)} \sum_{l=1}^{n} \psi_\pi(i, j, i^*, j^*, l), \tag{2.7}$$

where $i, j$ take all values satisfying $1 \le i < j \le n$ and $\pi(i) \le \pi(j)$, and where $i^*, j^*, l$ take values as
already indicated.

We use (2.7) to write $\mathrm{Var}(E(W^* - W \mid \pi))$ as a sum of variance and covariance terms. The
number of variance terms is bounded by $n^5$. The number of covariance terms

$$\mathrm{Covar}\left( \psi_\pi(i_1, j_1, i_1^*, j_1^*, l_1),\ \psi_\pi(i_2, j_2, i_2^*, j_2^*, l_2) \right) \tag{2.8}$$

with $\{i_1, j_1, i_1^*, j_1^*, l_1\} \cap \{i_2, j_2, i_2^*, j_2^*, l_2\} \ne \emptyset$ is fewer than $25 n^9$. Since $|\psi_\pi| \le 4$, the contribution of
the variance terms and covariance terms with the property just described is bounded by $16(n^5 +
25 n^9) / \left( \binom{n}{2}^2 \tfrac{1}{4} (n^2 - \sum_a n_a^2)^2 \right)$. We have used $\sum_{a>b} n_a n_b = \tfrac{1}{2}(n^2 - \sum_a n_a^2)$ to obtain this bound.

Covariance terms of the form (2.8) with $\{i_1, j_1, i_1^*, j_1^*, l_1\} \cap \{i_2, j_2, i_2^*, j_2^*, l_2\} = \emptyset$ remain to be
considered. The number of such terms is fewer than $n^{10}$. Lemma 2.10 can be applied to argue
that such covariances are $O(1/n)$ as we may use the fact that $\pi$ is uniformly distributed to assume
$i_1, j_1, i_1^*, j_1^*, l_1 = 1, 2, 3, 4, 5$ and $i_2, j_2, i_2^*, j_2^*, l_2 = 6, 7, 8, 9, 10$ with no loss of generality. The proof
can now be easily completed. □

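Theorem 2.12. Let $\pi$ be a uniformly distributed permutation of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$,
where $n_a \in \mathbb{Z}^+$ for $1 \le a \le h$. Assume that $\alpha \in (0, 1)$ is fixed and that $n_a \le \alpha n$ for $1 \le a \le h$.
Let $\beta = \max(1/2, \alpha)$. Let $h : \mathbb{R} \to \mathbb{R}$ be a bounded continuous function with bounded piecewise
continuous derivative $Dh$. Then for $n > n_0(\beta)$,

$$\left| E\,h\!\left( \frac{\mathrm{inv}(\pi) - \mu}{\sigma} \right) - \Phi h \right| \le \frac{C \|h\|}{\beta(1-\beta)\left(\beta(1-\beta) n^{1/2} - C_1 n^{-1/2}\right)} + \frac{C \|Dh\|}{\left(\beta(1-\beta) n^{1/3} - C_2 n^{-2/3}\right)^{3/2}},$$

where $C$, $C_1$, and $C_2$ are some positive constants, $\Phi h$ is the expectation of $h$ with respect to the
standard normal distribution, and $\mu$ and $\sigma^2$ are the mean and variance of $\mathrm{inv}(\pi)$, respectively.
If $C(\beta)$ is allowed to depend upon $\beta$, we may assert

$$\left| P\left( \frac{\mathrm{inv}(\pi) - \mu}{\sigma} \le x \right) - \Phi(x) \right| \le \frac{C(\beta)}{\sqrt{n}}$$

for some positive constant $C(\beta)$.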
Proof. Let $W = \mathrm{inv}(\pi)$. By Lemmas 2.[?] and 2.6, $\sigma^2 \ge (\beta(1-\beta)/12)\,n^3 + O(n^2)$ and $\mu \le n^2/4$. By Lemmas 2.3 and 2.11, $\mathrm{Var}(E(W^* - W \mid \pi)) \le Cn/(\beta(1-\beta))^2$ for some constant $C$. By construction of the size biased variable $W^*$, $|W^* - W| \le 4n$, and therefore $E(W^* - W)^2 \le 16n^2$. If we note that $\mathrm{Var}(E(W^* - W \mid W)) \le \mathrm{Var}(E(W^* - W \mid \pi))$, Theorem 2.1 can be applied to prove the first part of this theorem.

The second part is proved using Theorem 2.2. By construction of $W^*$, $|W^* - W| \le 4n$. Therefore we can take $B = 4n$. The inequality $B \le \sigma^{3/2}/\sqrt{6\mu}$ must hold for large enough $n$ by the bounds for $\sigma$ and $\mu$ given above. □
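The $n^{-1/2}$ rate in the first bound can be traced by substituting the orders of magnitude from the proof. The sketch below assumes that Theorem 2.1 has the usual size bias form $|E\,h((W-\mu)/\sigma) - \Phi h| \le 2\|h\|\,(\mu/\sigma^2)\sqrt{\mathrm{Var}\,E(W^*-W \mid W)} + \|Dh\|\,(\mu/\sigma^3)\,E(W^*-W)^2$. That statement lies outside this section, so its exact form and constants are an assumption here, and $\lesssim$ hides absolute constants and lower-order terms.

```latex
\begin{align*}
\frac{\mu}{\sigma^2}\sqrt{\mathrm{Var}\,E(W^*-W\mid W)}
  &\lesssim \frac{n^2}{\beta(1-\beta)\,n^3}\cdot\frac{n^{1/2}}{\beta(1-\beta)}
   = \frac{1}{\beta^2(1-\beta)^2\,n^{1/2}},\\[4pt]
\frac{\mu}{\sigma^3}\,E(W^*-W)^2
  &\lesssim \frac{n^2\cdot n^2}{(\beta(1-\beta))^{3/2}\,n^{9/2}}
   = \frac{1}{(\beta(1-\beta))^{3/2}\,n^{1/2}}.
\end{align*}
```

Both terms decay like $n^{-1/2}$ for fixed $\beta$, which agrees with the leading behaviour of the two fractions in Theorem 2.12.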

2.2 Descents of permutations of multisets 

Let $W = X_{12} + X_{23} + \cdots + X_{n-1,n}$. Then $W = \mathrm{des}(\pi)$, with $\pi$ uniformly distributed over permutations of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$.

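Lemma 2.13. Let $\mu = EW$ and $\sigma^2 = \mathrm{Var}(W)$. Then

\[ \mu = \frac{n^2 - \sum_a n_a^2}{2n} \quad\text{and}\quad \sigma^2 = [\text{expression not recoverable from the source}]. \]
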

Proof. Since $\mu = (n-1)p_1$, where $p_1 = EX_{ij}$, and $p_1$ is given by (2.2), the expression for $\mu$ in the lemma must hold.
We first show that

\[ \sigma^2 = (n-1)(p_1 - p_1^2) + 2(n-2)(p_4 - p_1^2) + (n-2)(n-3)(p_5 - p_1^2), \tag{2.9} \]

where the $p_i$ are given by (2.2). If $\mathrm{Var}(W)$, with $W = X_{12} + X_{23} + \cdots + X_{n-1,n}$, is written as the sum of variances and covariances of the $X_{i,i+1}$, there are $(n-1)$ variance terms, each equal to $p_1 - p_1^2$. There are $(n-2)$ covariance terms of the form $\mathrm{Covar}(X_{i,i+1}, X_{i+1,i+2})$, each equal to $p_4 - p_1^2$ and each counted twice in the variance. The remaining covariance terms are all equal to $p_5 - p_1^2$.

The expression for $\sigma^2$ in the lemma is deduced using (2.2), (2.9), and the two inequalities $\sum_a n_a^2 \le n^2$ and $\sum_a n_a^3 \le n^3$. □
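The decomposition (2.9) can be checked exactly on a small multiset. The sketch below is only an illustration: since (2.2) lies outside this section, it assumes that $p_1 = P(\pi(1) > \pi(2))$, $p_4 = P(\pi(1) > \pi(2) > \pi(3))$ and $p_5 = P(\pi(1) > \pi(2),\ \pi(3) > \pi(4))$, and the helper names are hypothetical. It compares the right-hand side of (2.9) with the variance obtained by enumerating every arrangement.

```python
from fractions import Fraction
from itertools import permutations


def pattern_prob(items, k, event):
    """Probability that the letters in k fixed distinct positions satisfy `event`.

    By exchangeability this is the fraction of ordered k-tuples of distinct
    positions whose letters satisfy the event.
    """
    tuples = list(permutations(items, k))
    return Fraction(sum(1 for t in tuples if event(t)), len(tuples))


def des(seq):
    """Number of descents: positions i with seq[i] > seq[i + 1]."""
    return sum(1 for i in range(len(seq) - 1) if seq[i] > seq[i + 1])


def variance_from_2_9(items):
    """Right-hand side of (2.9) under the assumed meaning of p1, p4, p5."""
    n = len(items)
    p1 = pattern_prob(items, 2, lambda t: t[0] > t[1])
    p4 = pattern_prob(items, 3, lambda t: t[0] > t[1] > t[2])
    p5 = pattern_prob(items, 4, lambda t: t[0] > t[1] and t[2] > t[3])
    return ((n - 1) * (p1 - p1**2)
            + 2 * (n - 2) * (p4 - p1**2)
            + (n - 2) * (n - 3) * (p5 - p1**2))


def variance_by_enumeration(items):
    """Exact Var(des) over all arrangements of the labelled letters.

    Each distinct multiset permutation appears equally often among the
    labelled arrangements, so this is the variance under the uniform model.
    """
    values = [des(p) for p in permutations(items)]
    mean = Fraction(sum(values), len(values))
    return Fraction(sum(v * v for v in values), len(values)) - mean**2


multiset = [1, 1, 2, 2, 3, 3]
print(variance_from_2_9(multiset), variance_by_enumeration(multiset))
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.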

The construction of the size biased variable $W^*$ is the same as the construction for inversions given immediately after Lemma 2.6, with the following differences. The random variable $I$ must be equal to one of $(1, 2), (2, 3), \ldots, (n-1, n)$ with equal probability. In the construction of $\pi^*$, the symbol $j$ must be replaced everywhere by $i + 1$. Finally, $W^* = X^*_{12} + X^*_{23} + \cdots + X^*_{n-1,n}$, where $X^*_{ij}$ is 1 if $\pi^*(i) > \pi^*(j)$ and 0 otherwise.

Lemma 2.14. The random variable $W^*$ has the $W$-size biased distribution.

Proof. Similar to the proof of Lemma 2.8. □
Lemma 2.15.

\[ \mathrm{Var}(E(W^* - W \mid \pi)) \le \frac{Cn}{[\text{denominator not recoverable from the source}]} \]

for some constant $C$.
Proof. By arguing as in the proof of Lemma 2.11 we get

\[ E(W^* - W \mid \pi) = [\text{normalizing factor not recoverable from the source}] \sum_{i} \sum_{(i^*, j^*)} \psi_\pi(i, i^*, j^*). \tag{2.10} \]

In (2.10), $i$ takes all values such that $\pi(i) \le \pi(i+1)$, $(i^*, j^*)$ takes all values such that $\pi(i^*) > \pi(j^*)$, and $\psi_\pi(i, i^*, j^*) = \mathrm{des}(\pi^*) - \mathrm{des}(\pi)$, where $\pi^*$ is constructed by exchanging $\pi(i), \pi(i+1)$ with $\pi(i^*), \pi(j^*)$, as described. Note that $|\psi_\pi| \le 7$.

We use (2.10) to write $\mathrm{Var}(E(W^* - W \mid \pi))$ as the sum of variance and covariance terms. There are $O(n^3)$ variance terms of the form $\mathrm{Var}(\psi_\pi)$. The number of terms of the form

\[ \mathrm{Covar}\bigl(\psi_\pi(i_1, i_1^*, j_1^*),\ \psi_\pi(i_2, i_2^*, j_2^*)\bigr), \tag{2.11} \]

where one of the numbers $\{i_1, i_1^*, j_1^*\}$ differs from one of the numbers $\{i_2, i_2^*, j_2^*\}$ by 3 or less in magnitude, is $O(n^5)$. The magnitude of such covariance terms and of the variance terms is bounded by 49. The number of covariance terms of the form (2.11) where none of the numbers $\{i_1, i_1^*, j_1^*\}$ differs from any one of the numbers $\{i_2, i_2^*, j_2^*\}$ by 3 or less in magnitude is $O(n^6)$. By Lemma 2.10 the magnitude of such covariance terms is $O(1/n)$. The proof is now easily completed. □

It is worth noting again that $\beta^4 - 4\beta^2 + 4\beta - 1 = (1-\beta)^2(\beta^2 + 2\beta - 1) > 0$ for $\beta \in [1/2, 1)$.
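The factorization and the sign claim can be verified by direct expansion. The short derivation below is only a check and introduces no new notation.

```latex
\begin{align*}
(1-\beta)^2(\beta^2 + 2\beta - 1)
  &= (1 - 2\beta + \beta^2)(\beta^2 + 2\beta - 1) \\
  &= (\beta^2 + 2\beta - 1) + (-2\beta^3 - 4\beta^2 + 2\beta) + (\beta^4 + 2\beta^3 - \beta^2) \\
  &= \beta^4 - 4\beta^2 + 4\beta - 1 .
\end{align*}
% Sign: \beta^2 + 2\beta - 1 = (\beta + 1)^2 - 2 > 0 exactly when \beta > \sqrt{2} - 1 \approx 0.414,
% and (1 - \beta)^2 > 0 for \beta < 1, so the product is positive on [1/2, 1).
% At \beta = 1/2 the value is (1/4)(1/4) = 1/16.
```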

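Theorem 2.16. Let $\pi$ be a uniformly distributed permutation of the multiset $\{1^{n_1}, 2^{n_2}, \ldots, h^{n_h}\}$, where $n_a \in \mathbb{Z}^+$ for $1 \le a \le h$. Assume that $\alpha \in (0, 1)$ is fixed and that $n_a \le \alpha n$ for $1 \le a \le h$. Let $\beta = \max(1/2, \alpha)$. Let $h : \mathbb{R} \to \mathbb{R}$ be a bounded continuous function with bounded piecewise continuous derivative $Dh$. Then for $n > n_0(\beta)$,

\[ \left| E\,h\!\left(\frac{\mathrm{des}(\pi) - \mu}{\sigma}\right) - \Phi h \right| \le \frac{C\,\|h\|}{\beta(1-\beta)\sqrt{n}\,\bigl(\beta^4 - 4\beta^2 + 4\beta - 1 - C_1 n^{-1}\bigr)} + \frac{\|Dh\|}{\bigl((\beta^4 - 4\beta^2 + 4\beta - 1)\,n^{1/3} - C_2 n^{-2/3}\bigr)^{3/2}}, \]

where $C$, $C_1$, and $C_2$ are some positive constants, $\Phi h$ is the expectation of $h$ with respect to the standard normal distribution, and $\mu$ and $\sigma^2$ are the mean and variance of $\mathrm{des}(\pi)$, respectively.
If $C(\beta)$ is allowed to depend upon $\beta$, we may assert

[displayed bound not recoverable from the source; only the normalized quantity $(\mathrm{des}(\pi) - \mu)/\sigma$ survives]

for some positive constant $C(\beta)$.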

Proof. Let $W = \mathrm{des}(\pi)$. By Lemmas 2.[?], 2.3 and 2.13, $\sigma^2 \ge ((\beta^4 - 4\beta^2 + 4\beta - 1)/4)\,n + O(1)$ and $\mu \le n/2$. By Lemmas 2.3 and 2.15, $\mathrm{Var}(E(W^* - W \mid \pi)) \le C/(n\beta^2(1-\beta)^2)$ for some constant $C$. By construction of the size biased variable $W^*$, $|W^* - W| \le 7$, and therefore $E(W^* - W)^2 \le 49$. If we note that $\mathrm{Var}(E(W^* - W \mid W)) \le \mathrm{Var}(E(W^* - W \mid \pi))$, Theorem 2.1 can be applied to prove the first part of this theorem.

The second part is proved using Theorem 2.2. By construction of $W^*$, $|W^* - W| \le 8$. Therefore we can take $B = 8$. The inequality $B \le \sigma^{3/2}/\sqrt{6\mu}$ must hold for large enough $n$ by the bounds for $\sigma$ and $\mu$ given above. □
          A          C          G          T
A   4229414    2833985    4154304    3165323
C   4221129    4044958    1057112    4150574
G   3423863    3180474    4056078    2846197
T   2508620    3414357    4239118    4260148

                  A                  C                  G                  T
A   103435711266825     94175991781325     94404662110136    103982892949612
C    99617649978799     90771286164651     90984870248490    100143945584446
G    99861289457776     91000167345198     91214277105966    100388853252680
T   103452603097706     94178097170636     94406787118036    104000539364403

Table 1: The first table above reports the number of occurrences of $\pi(i) = x$ and $\pi(i+1) = y$. The second table reports the number of occurrences of $\pi(i) = x$ and $\pi(j) = y$, with $i < j$. Rows are indexed by $x$ and columns by $y$. The permutation $\pi$ corresponds to chromosome 19, and $x$ and $y$ can be A, C, G, or T.

3 Descents and inversions of the human genome 

The human genome consists of 24 chromosomes, each of which is a sequence of bases labeled A, C, 
G, or T. The 19th chromosome has the following counts for the four bases (see [B]): 

$n_A = 14383026$, $n_C = 13473774$, $n_G = 13506612$, $n_T = 14422243$.

The version of the human genome reported in [£] has 341 gaps. The 19th chromosome has only 
three gaps in the middle. We ignored these gaps when counting the number of inversions and 
descents. 
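Under the uniform model, the four base counts alone determine the means and standard deviations of both statistics. The sketch below is a hedged illustration rather than the authors' computation. For descents it evaluates (2.9), assuming $p_1$, $p_4$, $p_5$ are the probabilities of the patterns $\pi(1) > \pi(2)$, $\pi(1) > \pi(2) > \pi(3)$, and $\pi(1) > \pi(2),\ \pi(3) > \pi(4)$. For inversions it uses the mean $(n^2 - \sum_a n_a^2)/4$ and, as a swapped-in assumption not quoted in this section, the standard Jonckheere-Terpstra null variance $[n^2(2n+3) - \sum_a n_a^2(2n_a+3)]/72$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

COUNTS = [14383026, 13473774, 13506612, 14422243]  # n_A, n_C, n_G, n_T


def pattern_prob(counts, k, event):
    """P(event) for the letters found in k fixed distinct positions.

    `event` receives a tuple of letter ranks; letters are drawn without
    replacement from the multiset described by `counts`.
    """
    n = sum(counts)
    favourable = 0
    for letters in product(range(len(counts)), repeat=k):
        if event(letters):
            ways, used = 1, [0] * len(counts)
            for a in letters:
                ways *= counts[a] - used[a]
                used[a] += 1
            favourable += ways
    total = 1
    for j in range(k):
        total *= n - j
    return Fraction(favourable, total)


def descent_moments(counts):
    """Mean (n - 1) * p1 and standard deviation from (2.9)."""
    n = sum(counts)
    p1 = pattern_prob(counts, 2, lambda t: t[0] > t[1])
    p4 = pattern_prob(counts, 3, lambda t: t[0] > t[1] > t[2])
    p5 = pattern_prob(counts, 4, lambda t: t[0] > t[1] and t[2] > t[3])
    var = ((n - 1) * (p1 - p1**2) + 2 * (n - 2) * (p4 - p1**2)
           + (n - 2) * (n - 3) * (p5 - p1**2))
    return float((n - 1) * p1), sqrt(float(var))


def inversion_moments(counts):
    """Mean and standard deviation of inv under the uniform model."""
    n = sum(counts)
    mean = Fraction(n * n - sum(c * c for c in counts), 4)
    # Assumed closed form (Jonckheere-Terpstra null variance), not from this section.
    var = Fraction(n * n * (2 * n + 3) - sum(c * c * (2 * c + 3) for c in counts), 72)
    return float(mean), sqrt(float(var))


print(descent_moments(COUNTS))
print(inversion_moments(COUNTS))
```

Exact rational arithmetic is used until the final conversion because the terms of (2.9) nearly cancel at this value of $n$.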

From Lemmas 2.6 and 2.13 and their proofs, we find the expected number of descents and inversions to be $\mu_d = 2.0912146861 \times 10^7$ and $\mu_i = 5.8329890505 \times 10^{14}$, respectively. The standard deviations are $\sigma_d = 2.0871959423 \times 10^3$ and $\sigma_i = 6.7231321079 \times 10^{10}$. Data about the 19th chromosome reported in Table 1 can be used to calculate the number of descents and inversions for any ordering of A, C, G, and T. By Theorems 2.12 and 2.16 the number of descents and inversions must have a distribution that is close to the normal distribution if $\pi$ is a uniformly distributed permutation of the bases in the 19th chromosome. The number of descents and inversions in the 19th chromosome itself is reported in Table 2 for all possible orderings of A, C, G, and T and with suitable normalization. From each line of this table, we may infer that the null hypothesis stating the 19th chromosome to be a random permutation of its bases is very unlikely to hold.
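The calculation from Table 1 can be sketched as follows. For a chosen ordering of the bases, the number of descents is the sum of the adjacent-pair counts with $x$ ranked above $y$, and the number of inversions is the same sum over the second table. The sketch assumes rows are indexed by $x$ and columns by $y$, and it takes $\mu_d$, $\sigma_d$, $\mu_i$, $\sigma_i$ as the values reported in this section. The function names are hypothetical.

```python
BASES = "ACGT"

# ADJACENT[x][y]: occurrences of pi(i) = x and pi(i + 1) = y (first part of Table 1).
ADJACENT = [
    [4229414, 2833985, 4154304, 3165323],
    [4221129, 4044958, 1057112, 4150574],
    [3423863, 3180474, 4056078, 2846197],
    [2508620, 3414357, 4239118, 4260148],
]
# ORDERED[x][y]: occurrences of pi(i) = x and pi(j) = y with i < j (second part).
ORDERED = [
    [103435711266825, 94175991781325, 94404662110136, 103982892949612],
    [99617649978799, 90771286164651, 90984870248490, 100143945584446],
    [99861289457776, 91000167345198, 91214277105966, 100388853252680],
    [103452603097706, 94178097170636, 94406787118036, 104000539364403],
]
MU_D, SIGMA_D = 2.0912146861e7, 2.0871959423e3
MU_I, SIGMA_I = 5.8329890505e14, 6.7231321079e10


def count_greater(table, order):
    """Sum the entries table[x][y] for which x comes after y in `order`."""
    rank = {base: r for r, base in enumerate(order)}
    return sum(table[BASES.index(x)][BASES.index(y)]
               for x in BASES for y in BASES if rank[x] > rank[y])


def z_scores(order):
    """Normalized descents and inversions when `order` is taken as increasing."""
    des = count_greater(ADJACENT, order)
    inv = count_greater(ORDERED, order)
    return (des - MU_D) / SIGMA_D, (inv - MU_I) / SIGMA_I


for order in ("ACGT", "CAGT"):
    z_des, z_inv = z_scores(order)
    print(order, round(z_des, 2), round(z_inv, 2))
```

For the orderings A,C,G,T and C,A,G,T this reproduces the corresponding entries of Table 2 to the two decimals shown there.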

Estimations of the entropy of DNA sequences can be found in ^2j and [Hj. Those estimates 
too imply that DNA sequences are far from random. We note that Table 2 assumes the number of
As, Cs, Gs, and Ts to be given and computes a statistic to test if their arrangement in a sequence 
is random. This is a different notion of randomness from that of entropy. For instance, it is 
possible for a sequence to have A for 90% of its letters which would mean that the sequence can 
be significantly compressed. Yet the arrangement of the letters could be generated randomly. 
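The distinction can be illustrated with a toy two-letter sequence. The sketch below is an illustration under stated assumptions, not part of the chromosome analysis. It builds a sequence with 90% A, shuffles it with a fixed seed, and reports both the entropy of the letter frequencies and the normalized inversion count. For two letters the inversion count is the Mann-Whitney statistic, so the sketch uses its null mean $n_A n_C/2$ and variance $n_A n_C (n+1)/12$. The sequence length and seed are arbitrary choices.

```python
import math
import random


def entropy_bits(counts):
    """Shannon entropy (bits per letter) of the empirical letter frequencies."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def inversion_z(seq):
    """Normalized inversion count of a sequence over the two letters A < C."""
    n_a, n_c = seq.count("A"), seq.count("C")
    seen_c = inv = 0
    for letter in seq:
        if letter == "C":
            seen_c += 1
        else:
            inv += seen_c  # every earlier C forms an inversion with this A
    mean = n_a * n_c / 2
    sd = math.sqrt(n_a * n_c * (n_a + n_c + 1) / 12)
    return (inv - mean) / sd


rng = random.Random(0)  # fixed seed, chosen arbitrarily
shuffled = ["A"] * 9000 + ["C"] * 1000
rng.shuffle(shuffled)
ordered = ["C"] * 1000 + ["A"] * 9000  # same letters, highly non-random arrangement

print(entropy_bits([9000, 1000]))
print(inversion_z(shuffled), inversion_z(ordered))
```

Both sequences have the same low entropy per letter, yet only the second arrangement yields a large normalized inversion count.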

In the bounds given by Theorems 2.12 and 2.16 the constants $C(\beta)$ are not determined explicitly. In this example $n$ is greater than $5 \times 10^7$. For the large departures from the mean that are seen in Table 2 it is reasonable to assume that the probabilities of finding such departures, if the sequence
Order      (des - mu_d)/sigma_d   (inv - mu_i)/sigma_i     Order      (des - mu_d)/sigma_d   (inv - mu_i)/sigma_i
A,C,G,T           36.13                 -11.64             C,A,G,T         -628.47                 -92.58
G,A,C,T         -631.23                 -93.03             A,G,C,T         -981.20                 -11.86
C,G,A,T         -278.50                -173.74             G,C,A,T        -1295.83                -173.96
T,C,A,G         -628.47                  93.03             C,T,A,G         -981.20                   4.29
A,T,C,G         -278.50                 166.08             T,A,C,G           36.13                 173.96
C,A,T,G        -1295.83                  -3.60             A,C,T,G         -631.23                  77.34
A,G,T,C         -628.47                  76.87             G,A,T,C         -278.50                  -4.29
T,A,G,C         -981.20                 173.74             A,T,G,C        -1295.83                 165.85
G,T,A,C           36.13                   3.60             T,G,A,C         -631.23                  92.58
T,G,C,A        -1295.83                  11.64             G,T,C,A         -628.47                 -77.34
C,T,G,A         -631.23                 -76.87             T,C,G,A         -278.50                  11.86
G,C,T,A         -981.20                -166.08             C,G,T,A           36.13                -165.85

Table 2: This table reports the normalized number of descents and inversions of the 19th chromosome, when the orders shown in the first and the fourth columns are considered increasing.

were a uniformly distributed permutation, are less than .001. Such a bound is implied in most cases by Chebyshev's inequality. Yet even this is surely an overestimate. For uniformly distributed $\pi$, the probabilities that $\mathrm{des}(\pi)$ and $\mathrm{inv}(\pi)$ depart from their means by a certain amount appear to fall off at least as fast as the bell curve does away from zero. Therefore, for large deviations from the mean, the bounds given by Theorems 2.12 and 2.16 are not accurate and better bounds would be desirable.
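The Chebyshev remark can be made concrete. Chebyshev's inequality gives $P(|Z| \ge k) \le 1/k^2$ for a standardized variable $Z$, so the bound falls below .001 once $k > \sqrt{1000} \approx 31.6$. The sketch below applies this to the normalized values of Table 2, with the inversion column copied in full and the six distinct descent values listed once. The variable names are hypothetical.

```python
import math

# Normalized inversion counts, copied from Table 2 (24 orderings).
INV_Z = [-11.64, -92.58, -93.03, -11.86, -173.74, -173.96, 93.03, 4.29,
         166.08, 173.96, -3.60, 77.34, 76.87, -4.29, 173.74, 165.85,
         3.60, 92.58, 11.64, -77.34, -76.87, 11.86, -166.08, -165.85]
# Distinct normalized descent counts appearing in Table 2.
DES_Z = [36.13, -628.47, -631.23, -981.20, -278.50, -1295.83]


def chebyshev_bound(z):
    """Chebyshev's inequality: P(|Z| >= |z|) <= 1 / z**2 for standardized Z."""
    return min(1.0, 1.0 / z**2)


threshold = math.sqrt(1000)  # |z| above this gives a bound below .001
inv_hits = sum(1 for z in INV_Z if chebyshev_bound(z) < 0.001)
des_hits = sum(1 for z in DES_Z if chebyshev_bound(z) < 0.001)

print(round(threshold, 1))
print(inv_hits, "of", len(INV_Z), "inversion entries")
print(des_hits, "of", len(DES_Z), "distinct descent values")
```

Every descent entry clears the threshold, while the inversion entries with magnitude below about 31.6 do not, which is consistent with the phrase "in most cases".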